IMS Lecture Notes–Monograph Series

Asymptotics: Particles, Processes and Inverse Problems 

Vol. 55 (2007) 101–107

© Institute of Mathematical Statistics 2007 
DOI: 10.1214/074921707000000292

Marshall's lemma for convex density 
Abstract: Marshall's [Nonparametric Techniques in Statistical Inference (1970) 174–176] lemma is an analytical result which implies $\sqrt{n}$-consistency of the distribution function corresponding to the Grenander [Skand. Aktuarietidskr. 39 (1956) 125–153] estimator of a non-increasing probability density. The present paper derives analogous results for the setting of convex densities on $[0,\infty)$.



1. Introduction 

Let $\mathbb{F}$ be the empirical distribution function of independent random variables $X_1, X_2, \ldots, X_n$ with distribution function $F$ and density $f$ on the halfline $[0,\infty)$. Various shape restrictions on $f$ enable consistent nonparametric estimation of it without any tuning parameters (e.g. bandwidths for kernel estimators).

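The oldest and most famous example is the Grenander estimator $\hat f$ of $f$ under the assumption that $f$ is non-increasing. Denoting the family of all such densities by $\mathcal{F}$, the Grenander estimator may be viewed as the maximum likelihood estimator,

$$\hat f = \operatorname{argmax} \Big\{ \int \log h \, d\mathbb{F} : h \in \mathcal{F} \Big\},$$

or as a least squares estimator,

$$\hat f = \operatorname{argmin} \Big\{ \int_0^\infty h(x)^2 \, dx - 2 \int h \, d\mathbb{F} : h \in \mathcal{F} \Big\},$$

cf. Robertson et al. [5]. Note that if $\mathbb{F}$ had a square-integrable density $\mathbb{F}'$, then the preceding argmin would coincide with the minimizer of $\int_0^\infty (h - \mathbb{F}')(x)^2 \, dx$ over all non-increasing probability densities $h$ on $[0,\infty)$.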

A nice property of $\hat f$ is that the corresponding distribution function $\hat F$,

$$\hat F(r) := \int_0^r \hat f(x) \, dx,$$

is automatically $\sqrt{n}$-consistent. More precisely, since $\hat F$ is the least concave majorant of $\mathbb{F}$, it follows from Marshall's [4] lemma that

$$\|\hat F - F\|_\infty \le \|\mathbb{F} - F\|_\infty.$$

A more refined asymptotic analysis of $\hat F - \mathbb{F}$ has been provided by Kiefer and Wolfowitz [3].
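Marshall's inequality can be checked numerically. The Python sketch below illustrates the inequality under stated assumptions and is not the authors' code. It draws an Exp(1) sample, so that $f$ is non-increasing and $F$ is concave. It then computes the least concave majorant $\hat F$ of $\mathbb{F}$ as the upper convex hull of the points $(0,0), (X_{(1)}, 1/n), \ldots, (X_{(n)}, 1)$ and reads off the Grenander estimator as the slopes of that hull. Finally, it compares a grid approximation of $\|\hat F - F\|_\infty$ with the exact value of $\|\mathbb{F} - F\|_\infty$. The helper name `least_concave_majorant`, the sample size and the seed are hypothetical choices.

```python
import numpy as np

def least_concave_majorant(x_sorted):
    """Vertices of the least concave majorant of the empirical df of x_sorted.

    It is the upper convex hull of (0, 0), (X_(1), 1/n), ..., (X_(n), 1);
    beyond X_(n) it stays equal to 1.
    """
    n = len(x_sorted)
    points = [(0.0, 0.0)] + [(float(x), (i + 1) / n) for i, x in enumerate(x_sorted)]
    hull = []
    for px, py in points:
        # Drop the last vertex while it does not lie strictly above the
        # chord from the vertex before it to the new point.
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (py - y1) - (y2 - y1) * (px - x1) >= 0:
                hull.pop()
            else:
                break
        hull.append((px, py))
    return np.array(hull)


rng = np.random.default_rng(0)
n = 200
x = np.sort(rng.exponential(size=n))   # Exp(1): f non-increasing, F concave
F = lambda t: 1.0 - np.exp(-t)

hull = least_concave_majorant(x)
grenander = np.diff(hull[:, 1]) / np.diff(hull[:, 0])  # slopes = Grenander estimator

# ||F_n - F||_inf is attained at a jump, via the left or right limit of F_n
i = np.arange(1, n + 1)
ks = max(np.max(i / n - F(x)), np.max(F(x) - (i - 1) / n))

# A fine grid plus the hull vertices gives a lower approximation of ||F_hat - F||_inf
grid = np.union1d(np.linspace(0.0, x[-1], 20001), hull[:, 0])
lcm_dist = np.max(np.abs(np.interp(grid, hull[:, 0], hull[:, 1]) - F(grid)))

print(bool(np.all(np.diff(grenander) < 0)), lcm_dist <= ks)
```

Because the grid value can only underestimate $\|\hat F - F\|_\infty$, the comparison is conservative. Beyond $X_{(n)}$ both $\hat F$ and $\mathbb{F}$ equal 1, so no grid points are needed there.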
2. Convex densities

Now we switch to the estimation of a convex probability density $f$ on $[0,\infty)$. As pointed out by Groeneboom et al. [2], the nonparametric maximum likelihood estimator $\hat f_{ml}$ and the least squares estimator $\hat f_{ls}$ are both well-defined and unique, but they are not identical in general. Let $\mathcal{K}$ denote the convex cone of all convex and integrable functions $g$ on $[0,\infty)$. (All functions within $\mathcal{K}$ are necessarily nonnegative and non-increasing.) Then

$$\hat f_{ml} = \operatorname*{argmax}_{h \in \mathcal{K}} \Big( \int \log h \, d\mathbb{F} - \int h(x) \, dx \Big),$$

$$\hat f_{ls} = \operatorname*{argmin}_{h \in \mathcal{K}} \Big( \int h(x)^2 \, dx - 2 \int h \, d\mathbb{F} \Big).$$
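A short Python sketch makes the two criteria concrete. The candidate $h_\theta(x) = 2(\theta - x)_+/\theta^2$ (a convex, non-increasing density), the simulated sample and the helper names are all hypothetical. Multiplying a candidate by a constant $c > 0$ changes the maximum likelihood criterion by $\log c - c\int h(x)\,dx$. For a density this is maximal at $c = 1$, so the term $-\int h(x)\,dx$ forces any maximiser over the whole cone $\mathcal{K}$ to integrate to one. For the least squares criterion, the best multiple of a fixed $h$ is $c^* = \int h\,d\mathbb{F} / \int h(x)^2\,dx$.

```python
import numpy as np

def h_theta(theta):
    """Hypothetical convex candidate h(x) = 2 (theta - x)_+ / theta^2 on [0, inf)."""
    return lambda x: 2.0 * np.clip(theta - x, 0.0, None) / theta**2

def ml_criterion(h_at_data, integral_h):
    # int log h dF_n - int h(x) dx, where F_n puts mass 1/n on each observation
    return np.mean(np.log(h_at_data)) - integral_h

def ls_criterion(h_at_data, integral_h_sq):
    # int h(x)^2 dx - 2 int h dF_n
    return integral_h_sq - 2.0 * np.mean(h_at_data)

rng = np.random.default_rng(1)
x = rng.triangular(0.0, 0.0, 1.0, size=100)   # true density 2 (1 - x) on [0, 1]
theta = 1.05 * x.max()                       # keeps h_theta > 0 at every observation
h = h_theta(theta)(x)
int_h, int_h_sq = 1.0, 4.0 / (3.0 * theta)   # closed-form integrals of h_theta, h_theta^2

scales = np.linspace(0.5, 1.5, 101)
ml = np.array([ml_criterion(c * h, c * int_h) for c in scales])
ls = np.array([ls_criterion(c * h, c**2 * int_h_sq) for c in scales])

c_ml = scales[np.argmax(ml)]                 # ML criterion prefers the scale c = 1
c_ls = scales[np.argmin(ls)]                 # grid point nearest mean(h) / int_h_sq
print(c_ml, c_ls, np.mean(h) / int_h_sq)
```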

Both estimators have the following property: 

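Proposition 1. Let $\hat f$ be either $\hat f_{ml}$ or $\hat f_{ls}$, and let $X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)}$ denote the order statistics. Then $\hat f$ is piecewise linear with

• at most one knot in each of the intervals $(X_{(i)}, X_{(i+1)})$, $1 \le i < n$,

• no knot at any observation $X_i$, and

• precisely one knot within $(X_{(n)}, \infty)$.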
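The knot pattern in Proposition 1 is a purely combinatorial statement about where the knots sit relative to the order statistics. The Python sketch below checks the three conditions for a given list of knots, with the origin excluded. The function name and the toy numbers are hypothetical. Knots below $X_{(1)}$ are not examined, because Proposition 1 as stated here does not address them.

```python
import numpy as np

def satisfies_knot_pattern(sample, knots):
    """Check the knot conditions of Proposition 1 (origin excluded from knots)."""
    xs = np.sort(np.asarray(sample, dtype=float))
    kn = np.asarray(knots, dtype=float)
    if np.isin(kn, xs).any():                 # no knot at any observation
        return False
    if np.count_nonzero(kn > xs[-1]) != 1:    # exactly one knot in (X_(n), inf)
        return False
    inner = kn[(kn > xs[0]) & (kn < xs[-1])]
    # 0-based gap index i with xs[i-1] < knot < xs[i] (equality was ruled out above)
    gap_index = np.searchsorted(xs, inner)
    return bool(np.all(np.bincount(gap_index) <= 1))   # at most one knot per gap

sample = [0.2, 0.5, 0.9, 1.4]
print(satisfies_knot_pattern(sample, [0.3, 0.7, 1.1, 1.8]))  # True
print(satisfies_knot_pattern(sample, [0.3, 0.4, 1.8]))       # False: two knots in (0.2, 0.5)
print(satisfies_knot_pattern(sample, [0.3, 0.5, 1.8]))       # False: knot at an observation
print(satisfies_knot_pattern(sample, [0.3, 1.6, 1.8]))       # False: two knots beyond X_(n)
```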

The estimators $\hat f_{ml}$, $\hat f_{ls}$ and their distribution functions $\hat F_{ml}$, $\hat F_{ls}$ are completely characterized by Proposition 1 and the next proposition.

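Proposition 2. Let $\Delta$ be any function on $[0,\infty)$ such that $\hat f_{ml} + t\Delta \in \mathcal{K}$ for some $t > 0$. Then

$$\int \frac{\Delta}{\hat f_{ml}} \, d\mathbb{F} \le \int \Delta(x) \, dx.$$

Similarly, let $\Delta$ be any function on $[0,\infty)$ such that $\hat f_{ls} + t\Delta \in \mathcal{K}$ for some $t > 0$. Then

$$\int \Delta \, d\mathbb{F} \le \int \Delta \, d\hat F_{ls}.$$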

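In what follows we derive two inequalities relating $\hat F - F$ and $\mathbb{F} - F$, where $\hat F$ stands for $\hat F_{ml}$ or $\hat F_{ls}$:

Theorem 1.

$$\inf_{[0,\infty)} (\hat F_{ml} - F) \ge \frac{3}{2} \inf_{[0,\infty)} (\mathbb{F} - F) - \frac{1}{2} \sup_{[0,\infty)} (\mathbb{F} - F), \tag{1}$$

$$\|\hat F_{ls} - F\|_\infty \le 2 \, \|\mathbb{F} - F\|_\infty. \tag{2}$$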

Both results rely on the following lemma: 

Lemma 1. Let $F, \hat F$ be continuous functions on a compact interval $[a,b]$, and let $\mathbb{F}$ be a bounded, measurable function on $[a,b]$. Suppose that the following additional assumptions are satisfied:

$$\hat F(a) = \mathbb{F}(a) \quad\text{and}\quad \hat F(b) = \mathbb{F}(b), \tag{3}$$

$$\hat F \text{ has a linear derivative on } (a,b), \tag{4}$$

$$F \text{ has a convex derivative on } (a,b), \tag{5}$$

$$\int_r^b \hat F(y) \, dy \le \int_r^b \mathbb{F}(y) \, dy \quad \text{for all } r \in [a,b]. \tag{6}$$
Then

$$\sup_{[a,b]} (\hat F - F) \le \frac{3}{2} \sup_{[a,b]} (\mathbb{F} - F) - \frac{1}{2} (\mathbb{F} - F)(b).$$

If condition (6) is replaced with

$$\int_a^r \hat F(x) \, dx \ge \int_a^r \mathbb{F}(x) \, dx \quad \text{for all } r \in [a,b], \tag{7}$$

then

$$\inf_{[a,b]} (\hat F - F) \ge \frac{3}{2} \inf_{[a,b]} (\mathbb{F} - F) - \frac{1}{2} (\mathbb{F} - F)(a).$$

The constants $3/2$ and $1/2$ are sharp. To see this, let $[a,b] = [0,1]$ and define

$$F(x) := \begin{cases} x^2 - c & \text{for } x \ge \epsilon, \\ (x/\epsilon)(\epsilon^2 - c) & \text{for } x \le \epsilon, \end{cases}$$

$$\hat F(x) := 0,$$

$$\mathbb{F}(x) := 1\{0 < x < 1\}\,(x^2 - 1/3)$$

for some constant $c > 1$ and some small number $\epsilon \in (0, 1/2]$. One easily verifies conditions (3)–(6). Moreover,

$$\sup_{[0,1]} (\hat F - F) = c - \epsilon^2, \qquad \sup_{[0,1]} (\mathbb{F} - F) = c - 1/3 \qquad\text{and}\qquad (\mathbb{F} - F)(1) = c - 1.$$

Hence the upper bound $(3/2)\sup(\mathbb{F} - F) - (1/2)(\mathbb{F} - F)(1)$ equals $\sup(\hat F - F) + \epsilon^2$ for any $c > 1$. Note the discontinuity of $\mathbb{F}$ at $0$ and $1$. However, by suitable approximation of $\mathbb{F}$ with continuous functions one can easily show that the constants remain optimal even under the additional constraint of $\mathbb{F}$ being continuous.
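The values in this example can be confirmed on a grid. The Python sketch below assumes the hypothetical choices $c = 2$ and $\epsilon = 0.1$; any $c > 1$ and $\epsilon \in (0, 1/2]$ would do. Condition (6) is checked through the closed form $\int_r^1 \mathbb{F}(y)\,dy = (r - r^3)/3$ for $r \in [0,1]$, since $\int_r^1 \hat F(y)\,dy = 0$.

```python
import numpy as np

c, eps = 2.0, 0.1                        # hypothetical parameter choices
x = np.linspace(0.0, 1.0, 100001)

F = np.where(x >= eps, x**2 - c, (x / eps) * (eps**2 - c))
F_hat = np.zeros_like(x)
F_n = np.where((x > 0) & (x < 1), x**2 - 1.0 / 3.0, 0.0)

# Condition (6): int_r^1 F_hat = 0 <= int_r^1 F_n = (r - r^3) / 3 on [0, 1]
cond6 = bool(np.all((x - x**3) / 3.0 >= 0.0))

sup_hat = np.max(F_hat - F)              # c - eps^2
sup_emp = np.max(F_n - F)                # c - 1/3
end_val = F_n[-1] - F[-1]                # (F_n - F)(1) = c - 1
bound = 1.5 * sup_emp - 0.5 * end_val    # equals c = sup_hat + eps^2

print(cond6, sup_hat, sup_emp, end_val, bound)
```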

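Proof of Lemma 1. We define $G := \hat F - F$ with derivative $g := G'$ on $(a,b)$. It follows from (3) that

$$\max_{\{a,b\}} G = \max_{\{a,b\}} (\mathbb{F} - F) \le \frac{3}{2} \sup_{[a,b]} (\mathbb{F} - F) - \frac{1}{2} (\mathbb{F} - F)(b).$$

Therefore it suffices to consider the case that $G$ attains its maximum at some point $r \in (a,b)$. In particular, $g(r) = 0$. We introduce an auxiliary linear function $\bar g$ on $[r,b]$ such that $\bar g(r) = 0$ and

$$\int_r^b \bar g(y) \, dy = \int_r^b g(y) \, dy = G(b) - G(r).$$

Note that $g = \hat F' - F'$ is concave on $(a,b)$ by (4)–(5), being the difference of a linear and a convex function. Hence the concave function $g - \bar g$ vanishes at $r$ and has integral zero over $[r,b]$, so there exists a number $y_0 \in (r,b)$ such that

$$g - \bar g \;\begin{cases} \ge 0 & \text{on } [r, y_0], \\ \le 0 & \text{on } [y_0, b). \end{cases}$$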

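This entails that

$$\int_r^y (g - \bar g)(u) \, du = -\int_y^b (g - \bar g)(u) \, du \ge 0 \quad \text{for any } y \in [r,b].$$

Therefore

$$G(y) = G(r) + \int_r^y g(u) \, du \ge G(r) + \int_r^y \bar g(u) \, du = G(r) + \frac{(y-r)^2}{(b-r)^2} \, [G(b) - G(r)],$$

where we used $\bar g(u) = 2(u - r)[G(b) - G(r)]/(b-r)^2$, so that

$$\int_r^b G(y) \, dy \ge (b-r) G(r) + \frac{G(b) - G(r)}{(b-r)^2} \int_r^b (y-r)^2 \, dy = (b-r) \Big[ \frac{2}{3} G(r) + \frac{1}{3} G(b) \Big] = (b-r) \Big[ \frac{2}{3} G(r) + \frac{1}{3} (\mathbb{F} - F)(b) \Big].$$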



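On the other hand, by assumption (6),

$$\int_r^b G(y) \, dy \le \int_r^b (\mathbb{F} - F)(y) \, dy \le (b-r) \sup_{[a,b]} (\mathbb{F} - F).$$

This entails that

$$G(r) \le \frac{3}{2} \sup_{[a,b]} (\mathbb{F} - F) - \frac{1}{2} (\mathbb{F} - F)(b).$$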



If (6) is replaced with (7), then note first that

$$\min_{\{a,b\}} G = \min_{\{a,b\}} (\mathbb{F} - F) \ge \frac{3}{2} \inf_{[a,b]} (\mathbb{F} - F) - \frac{1}{2} (\mathbb{F} - F)(a).$$

Therefore it suffices to consider the case that $G$ attains its minimum at some point $r \in (a,b)$. Now we consider a linear function $\bar g$ on $[a,r]$ such that $\bar g(r) = 0$ and

$$\int_a^r \bar g(x) \, dx = \int_a^r g(x) \, dx = G(r) - G(a).$$



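Here concavity of $g$ on $(a,b)$ entails that

$$\int_x^r (\bar g - g)(u) \, du = -\int_a^x (\bar g - g)(u) \, du \le 0 \quad \text{for any } x \in [a,r],$$

so that

$$G(x) = G(r) - \int_x^r g(u) \, du \le G(r) - \int_x^r \bar g(u) \, du = G(r) - \frac{(r-x)^2}{(r-a)^2} \, [G(r) - G(a)].$$

Consequently,

$$\int_a^r G(x) \, dx \le (r-a) G(r) - \frac{G(r) - G(a)}{(r-a)^2} \int_a^r (r-x)^2 \, dx = (r-a) \Big[ \frac{2}{3} G(r) + \frac{1}{3} (\mathbb{F} - F)(a) \Big],$$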
whereas

$$\int_a^r G(x) \, dx \ge \int_a^r (\mathbb{F} - F)(x) \, dx \ge (r-a) \inf_{[a,b]} (\mathbb{F} - F)$$

by assumption (7). This leads to

$$G(r) \ge \frac{3}{2} \inf_{[a,b]} (\mathbb{F} - F) - \frac{1}{2} (\mathbb{F} - F)(a). \qquad \Box$$
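Both integral evaluations in this proof rest on $\int_r^b (y - r)^2\,dy = (b - r)^3/3$ and $\int_a^r (r - x)^2\,dx = (r - a)^3/3$. As a sanity check, the Python sketch below integrates the two quadratic bounds for $G$ with Simpson's rule, which is exact for polynomials of degree at most three, using exact rational arithmetic. The endpoints and the numbers standing in for $G(a)$, $G(r)$, $G(b)$ are arbitrary hypothetical values, because the two identities are purely algebraic.

```python
from fractions import Fraction as Fr

def simpson(p, lo, hi):
    """Simpson's rule on one panel; exact for polynomials of degree <= 3."""
    mid = (lo + hi) / 2
    return (hi - lo) / 6 * (p(lo) + 4 * p(mid) + p(hi))

a, r, b = Fr(0), Fr(2, 5), Fr(1)                  # hypothetical interval and point r
G_a, G_r, G_b = Fr(-1, 3), Fr(1, 2), Fr(1, 7)     # hypothetical values of G

# Quadratic lower bound for G on [r, b] (first part of the proof)
bound_right = lambda y: G_r + (y - r) ** 2 / (b - r) ** 2 * (G_b - G_r)
# Quadratic upper bound for G on [a, r] (second part of the proof)
bound_left = lambda x: G_r - (r - x) ** 2 / (r - a) ** 2 * (G_r - G_a)

lhs1 = simpson(bound_right, r, b)
rhs1 = (b - r) * (Fr(2, 3) * G_r + Fr(1, 3) * G_b)
lhs2 = simpson(bound_left, a, r)
rhs2 = (r - a) * (Fr(2, 3) * G_r + Fr(1, 3) * G_a)
print(lhs1 == rhs1, lhs2 == rhs2)   # True True
```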

Proof of Theorem 1. Let $0 =: t_0 < t_1 < \cdots < t_m$ be the knots of $\hat f$, including the origin. In what follows we derive conditions (3)–(5) and (6)/(7) of Lemma 1 for any interval $[a,b] = [t_k, t_{k+1}]$ with $0 \le k < m$. For the reader's convenience we rely entirely on Proposition 2. In case of the least squares estimator, similar inequalities and arguments may be found in Groeneboom et al. [2].

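Let $0 < \epsilon < \min_{1 \le i \le m} (t_i - t_{i-1})/2$. For a fixed $k \in \{1, \ldots, m\}$ we define $\Delta_1$ to be continuous and piecewise linear with knots $t_{k-1} - \epsilon$ (if $k > 1$), $t_{k-1}$, $t_k$ and $t_k + \epsilon$. Namely, let $\Delta_1(x) = 0$ for $x \notin (t_{k-1} - \epsilon, t_k + \epsilon)$ and

$$\Delta_1(x) := \begin{cases} \hat f_{ml}(x) & \text{if } \hat f = \hat f_{ml}, \\ 1 & \text{if } \hat f = \hat f_{ls}, \end{cases} \qquad \text{for } x \in [t_{k-1}, t_k].$$

This function $\Delta_1$ satisfies the requirements of Proposition 2. Letting $\epsilon \downarrow 0$, the function $\Delta_1(x)$ converges pointwise to

$$\begin{cases} 1\{t_{k-1} \le x \le t_k\} \, \hat f_{ml}(x) & \text{if } \hat f = \hat f_{ml}, \\ 1\{t_{k-1} \le x \le t_k\} & \text{if } \hat f = \hat f_{ls}, \end{cases}$$

and the latter proposition yields the inequality

$$\mathbb{F}(t_k) - \mathbb{F}(t_{k-1}) \le \hat F(t_k) - \hat F(t_{k-1}).$$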

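Similarly, let $\Delta_2$ be continuous and piecewise linear with knots at $t_{k-1}$, $t_{k-1} + \epsilon$, $t_k - \epsilon$ and $t_k$. Precisely, let $\Delta_2(x) := 0$ for $x \notin (t_{k-1}, t_k)$ and

$$\Delta_2(x) := \begin{cases} -\hat f_{ml}(x) & \text{if } \hat f = \hat f_{ml}, \\ -1 & \text{if } \hat f = \hat f_{ls}, \end{cases} \qquad \text{for } x \in [t_{k-1} + \epsilon, t_k - \epsilon].$$

The limit of $\Delta_2(x)$ as $\epsilon \downarrow 0$ equals

$$\begin{cases} -1\{t_{k-1} < x < t_k\} \, \hat f_{ml}(x) & \text{if } \hat f = \hat f_{ml}, \\ -1\{t_{k-1} < x < t_k\} & \text{if } \hat f = \hat f_{ls}, \end{cases}$$

and it follows from Proposition 2 that

$$\mathbb{F}(t_k) - \mathbb{F}(t_{k-1}) \ge \hat F(t_k) - \hat F(t_{k-1}).$$

This shows that $\mathbb{F}(t_k) - \mathbb{F}(t_{k-1}) = \hat F(t_k) - \hat F(t_{k-1})$ for $k = 1, \ldots, m$. Since $\mathbb{F}(0) = \hat F(0) = 0$, one can rewrite this as

$$\hat F(t_k) = \mathbb{F}(t_k) \quad \text{for } k = 0, 1, \ldots, m. \tag{8}$$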
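Proposition 2 only applies to perturbations $\Delta$ for which $\hat f + t\Delta$ stays in $\mathcal{K}$ for some $t > 0$. For the least squares versions of $\Delta_1$ and $\Delta_2$, this can be checked directly on a toy example. In the Python sketch below, a piecewise linear convex function with knots $0, 1, 2, 3$ plays the role of $\hat f$; it is hypothetical and not normalised. The sketch uses $k = 2$ and $\epsilon = 0.1$. Since all functions involved are piecewise linear, convexity amounts to non-decreasing slopes between consecutive breakpoints. The perturbed function remains convex for small $t$, because the concave kinks of $\Delta_1$ and $\Delta_2$ sit at the knots $t_1 = 1$ and $t_2 = 2$, where $\hat f$ has convex kinks. For larger $t$, convexity fails.

```python
import numpy as np

# Hypothetical convex, piecewise linear fit with knots 0, 1, 2, 3 (zero beyond 3)
f_knots = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
f_vals = np.array([1.0, 0.5, 0.2, 0.0, 0.0])
eps = 0.1   # smaller than half the minimal knot spacing

# Least squares versions of Delta_1 and Delta_2 for [t_1, t_2] = [1, 2]: (breakpoints, values)
delta1 = (np.array([1 - eps, 1.0, 2.0, 2 + eps]), np.array([0.0, 1.0, 1.0, 0.0]))
delta2 = (np.array([1.0, 1 + eps, 2 - eps, 2.0]), np.array([0.0, -1.0, -1.0, 0.0]))

def is_convex_nonneg(delta, t, tol=1e-9):
    """Is f + t * Delta convex and nonnegative on [0, 4]? Both are piecewise
    linear, so it suffices to look at the union of their breakpoints."""
    dx, dy = delta
    grid = np.union1d(f_knots, dx)
    h = np.interp(grid, f_knots, f_vals) + t * np.interp(grid, dx, dy, left=0.0, right=0.0)
    slopes = np.diff(h) / np.diff(grid)
    return bool(np.all(np.diff(slopes) >= -tol) and np.all(h >= -tol))

for t in (0.005, 0.05):
    print(t, is_convex_nonneg(delta1, t), is_convex_nonneg(delta2, t))
```

In this toy case both perturbations are admissible for $t \le 0.01$, which is all that Proposition 2 requires.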

Now we consider first the maximum likelihood estimator $\hat f_{ml}$. For $0 \le k < m$ and $r \in (t_k, t_{k+1}]$ let $\Delta(x) := 0$ for $x \notin (t_k - \epsilon, r)$, let $\Delta$ be linear on $[t_k - \epsilon, t_k]$, and let $\Delta(x) := (r - x)\hat f_{ml}(x)$ for $x \in [t_k, r]$. One easily verifies that this function $\Delta$ satisfies the conditions of Proposition 2, too, and with $\epsilon \downarrow 0$ we obtain the inequality

$$\int_{t_k}^r (r - x) \, \mathbb{F}(dx) \le \int_{t_k}^r (r - x) \, \hat F(dx).$$

Integration by parts (or Fubini's theorem) shows that the latter inequality is equivalent to

$$\int_{t_k}^r \big(\mathbb{F}(x) - \mathbb{F}(t_k)\big) \, dx \le \int_{t_k}^r \big(\hat F(x) - \hat F(t_k)\big) \, dx.$$

Since $\mathbb{F}(t_k) = \hat F(t_k)$ by (8), we end up with

$$\int_{t_k}^r \mathbb{F}(x) \, dx \le \int_{t_k}^r \hat F(x) \, dx \quad \text{for } k = 0, 1, \ldots, m-1 \text{ and } r \in (t_k, t_{k+1}].$$

Hence condition (7) holds on each interval $[t_k, t_{k+1}]$, and we may apply Lemma 1 to obtain (1).
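The integration-by-parts step is the identity $\int_{(t_k, r]} (r - x)\,H(dx) = \int_{t_k}^r \big(H(x) - H(t_k)\big)\,dx$, valid for any right-continuous non-decreasing $H$. For $H = \mathbb{F}$ both sides can be computed exactly. The Python sketch below compares them on a simulated sample; the function names, the sample and the pairs $(t_k, r)$ are hypothetical.

```python
import numpy as np

def lhs(sample, t_k, r):
    """int_{(t_k, r]} (r - x) F_n(dx): F_n puts mass 1/n on each observation."""
    s = np.asarray(sample, dtype=float)
    inside = s[(s > t_k) & (s <= r)]
    return np.sum(r - inside) / s.size

def rhs(sample, t_k, r):
    """int_{t_k}^r (F_n(x) - F_n(t_k)) dx, integrating the step function exactly."""
    s = np.sort(np.asarray(sample, dtype=float))
    jumps = s[(s > t_k) & (s <= r)]
    edges = np.append(jumps, r)                      # integrand is constant between edges
    heights = np.arange(1, jumps.size + 1) / s.size  # F_n(x) - F_n(t_k) after the j-th jump
    return np.sum(heights * np.diff(edges))

rng = np.random.default_rng(2)
sample = rng.exponential(size=50)
for t_k, r in [(0.3, 1.2), (0.0, 2.5), (1.0, 1.0001)]:
    print(t_k, r, np.isclose(lhs(sample, t_k, r), rhs(sample, t_k, r)))
```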

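Finally, let us consider the least squares estimator $\hat f_{ls}$. For $0 \le k < m$ and $r \in (t_k, t_{k+1}]$ let $\Delta(x) := 0$ for $x \notin (t_k - \epsilon, r)$, let $\Delta$ be linear on $[t_k - \epsilon, t_k]$ as well as on $[t_k, r]$ with $\Delta(t_k) := r - t_k$. Then applying Proposition 2 and letting $\epsilon \downarrow 0$ yields

$$\int_{t_k}^r (r - x) \, \mathbb{F}(dx) \le \int_{t_k}^r (r - x) \, \hat F(dx),$$

so that

$$\int_{t_k}^r \mathbb{F}(x) \, dx \le \int_{t_k}^r \hat F(x) \, dx \quad \text{for } k = 0, 1, \ldots, m-1 \text{ and } r \in (t_k, t_{k+1}].$$

Thus it follows from Lemma 1 that

$$\inf_{[0,\infty)} (\hat F - F) \ge \frac{3}{2} \inf_{[0,\infty)} (\mathbb{F} - F) - \frac{1}{2} \sup_{[0,\infty)} (\mathbb{F} - F) \ge -2 \, \|\mathbb{F} - F\|_\infty.$$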



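Alternatively, for $1 \le k \le m$ and $r \in [t_{k-1}, t_k)$ let $\Delta(x) := 0$ for $x \notin (r, t_k + \epsilon)$, let $\Delta$ be linear on $[r, t_k]$ as well as on $[t_k, t_k + \epsilon]$ with $\Delta(t_k) := t_k - r$. Then applying Proposition 2 and letting $\epsilon \downarrow 0$ yields

$$\int_r^{t_k} (x - r) \, \mathbb{F}(dx) \le \int_r^{t_k} (x - r) \, \hat F(dx),$$

so that, by (8),

$$\int_r^{t_k} \mathbb{F}(x) \, dx \ge \int_r^{t_k} \hat F(x) \, dx \quad \text{for } k = 1, 2, \ldots, m \text{ and } r \in [t_{k-1}, t_k).$$

Hence it follows from Lemma 1 that

$$\sup_{[0,\infty)} (\hat F - F) \le \frac{3}{2} \sup_{[0,\infty)} (\mathbb{F} - F) - \frac{1}{2} \inf_{[0,\infty)} (\mathbb{F} - F) \le 2 \, \|\mathbb{F} - F\|_\infty. \qquad \Box$$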
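The last inequality in each of the two displays for the least squares estimator, and the passage from them to (2), use only $\sup(\mathbb{F} - F) \le \|\mathbb{F} - F\|_\infty$ and $\inf(\mathbb{F} - F) \ge -\|\mathbb{F} - F\|_\infty$, with all suprema and infima taken over $[0,\infty)$. Written out:

```latex
\begin{aligned}
\tfrac32 \sup(\mathbb{F}-F) - \tfrac12 \inf(\mathbb{F}-F)
  &\le \tfrac32 \|\mathbb{F}-F\|_\infty + \tfrac12 \|\mathbb{F}-F\|_\infty
   = 2\,\|\mathbb{F}-F\|_\infty, \\
\tfrac32 \inf(\mathbb{F}-F) - \tfrac12 \sup(\mathbb{F}-F)
  &\ge -\tfrac32 \|\mathbb{F}-F\|_\infty - \tfrac12 \|\mathbb{F}-F\|_\infty
   = -2\,\|\mathbb{F}-F\|_\infty, \\
\|\hat F_{ls} - F\|_\infty
  &= \max\Big\{ \sup(\hat F_{ls} - F),\; -\inf(\hat F_{ls} - F) \Big\}
   \le 2\,\|\mathbb{F}-F\|_\infty .
\end{aligned}
```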